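Catastrophic forgetting is a major problem that hinders the deployment of deep learning algorithms in continual learning settings. Many methods have been proposed to address catastrophic forgetting, in which an agent loses its generalization ability on old tasks while learning new ones. We propose an alternative strategy that handles catastrophic forgetting through knowledge amalgamation (CFA), which learns a student network from multiple heterogeneous teacher models, each specializing in a previous task, and which can be applied to current offline methods. The knowledge amalgamation process is carried out in a single-head manner, using only a selected number of memorized samples and no annotations. The teachers and the student do not need to share the same network structure, which allows heterogeneous tasks to be adapted to a compact or sparse data representation. We compare our method with competitive baselines from different strategies and demonstrate the advantages of our approach.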
translated by 谷歌翻译
A crucial issue with current text generation models is that they often uncontrollably generate text that is factually inconsistent with their inputs. Limited by the lack of annotated data, existing work on evaluating factual consistency directly transfers the reasoning ability of models trained on other data-rich upstream tasks, such as question answering (QA) and natural language inference (NLI), without any further adaptation. As a result, these metrics perform poorly on real generated text and are heavily biased by their single-source upstream tasks. To alleviate this problem, we propose a weakly supervised framework, WeCheck, that aggregates multiple resources to train a precise and efficient factual metric. WeCheck first uses a generative model to accurately label a real generated sample by aggregating its weak labels, which are inferred from multiple resources. We then train the target metric model with this weak supervision while taking label noise into account. Comprehensive experiments on a variety of tasks demonstrate the strong performance of WeCheck, which achieves a 3.4% average absolute improvement over previous state-of-the-art methods on the TRUE benchmark.
A digital twin is defined as a virtual representation of a physical asset enabled through data and simulators for real-time prediction, optimization, monitoring, controlling, and improved decision-making. Unfortunately, the term remains vague and says little about a twin's capability. Recently, the concept of capability level has been introduced to address this issue. According to this concept, a digital twin can be categorized by capability on a scale from zero to five, referred to as standalone, descriptive, diagnostic, predictive, prescriptive, and autonomous, respectively. The current work introduces the concept in the context of the built environment and demonstrates it using a modern house as a use case. The house is equipped with an array of sensors that collect time-series data on the internal state of the house. Together with physics-based and data-driven models, these data are used to develop digital twins at different capability levels, demonstrated in virtual reality. In addition to presenting a blueprint for developing digital twins, the work provides future research directions to enhance the technology.
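The capability scale described above maps naturally onto an ordered enumeration. The following is a minimal sketch: the level names and numbers come from the abstract, while the class name `CapabilityLevel` and the helper `describe` are hypothetical names introduced only for illustration.

```python
from enum import IntEnum


class CapabilityLevel(IntEnum):
    """Digital twin capability levels, numbered 0 to 5 as in the abstract."""
    STANDALONE = 0
    DESCRIPTIVE = 1
    DIAGNOSTIC = 2
    PREDICTIVE = 3
    PRESCRIPTIVE = 4
    AUTONOMOUS = 5


def describe(level: CapabilityLevel) -> str:
    """Return a short human-readable label (hypothetical helper)."""
    return f"level {int(level)} ({level.name.lower()})"


if __name__ == "__main__":
    for level in CapabilityLevel:
        print(describe(level))
```

Using an `IntEnum` keeps the ordering of the scale, so two twins can be compared directly, for example `CapabilityLevel.PREDICTIVE > CapabilityLevel.DIAGNOSTIC`.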
We study the performance of monolingual and multilingual language models on the task of question answering (QA) in three diverse languages: English, Finnish, and Japanese. We develop models for the tasks of (1) determining whether a question is answerable given the context and (2) identifying the answer text within the context using IOB tagging. Furthermore, we evaluate the effectiveness of a pre-trained multilingual encoder (Multilingual BERT) for cross-lingual zero-shot learning with both the answerability classifier and the IOB sequence classifier.
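To make the IOB formulation concrete, here is a minimal sketch of how an answer span could be encoded as per-token tags and decoded again. It assumes whitespace-level tokens and a single answer span given as token indices; the function names `iob_tags` and `extract_answer` are hypothetical, and the abstract does not specify the authors' actual tokenization or tagging scheme.

```python
from typing import List, Optional


def iob_tags(tokens: List[str],
             answer_start: Optional[int],
             answer_end: Optional[int]) -> List[str]:
    """Tag the first answer token 'B', the following answer tokens 'I',
    and every other token 'O'. The span is [answer_start, answer_end).
    Pass answer_start=None for an unanswerable question (all 'O')."""
    tags = ["O"] * len(tokens)
    if answer_start is None:
        return tags
    if answer_end is None or not (0 <= answer_start < answer_end <= len(tokens)):
        raise ValueError("answer span must lie inside the context")
    tags[answer_start] = "B"
    for i in range(answer_start + 1, answer_end):
        tags[i] = "I"
    return tags


def extract_answer(tokens: List[str], tags: List[str]) -> List[str]:
    """Decode the first B-I* run back into answer tokens."""
    span: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B" and not span:
            span.append(token)
        elif tag == "I" and span:
            span.append(token)
        elif span:
            break  # the first span has ended
    return span


if __name__ == "__main__":
    context = ["Helsinki", "is", "the", "capital", "of", "Finland"]
    tags = iob_tags(context, 0, 1)
    print(tags)
    print(extract_answer(context, tags))
```

Under this encoding, an all-`O` tag sequence can double as the "unanswerable" signal, although the abstract describes answerability as a separate classification task.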
Large language models both memorize and forget their training data. But how do these phenomena vary with model size? We work towards answering this question by investigating how model size affects a model's ability to discriminate a word's meaning in a given context. We introduce a dataset called DeltaWords, which evaluates a model's ability to follow instructions to select a sentence that replaces the target word with its antonym. We show a weak inverse scaling trend, where task accuracy degrades as model size increases, under extremely few-shot prompting regimes. We also show that increasing the number of examples tends to benefit larger models disproportionately more than smaller models.
This letter focuses on the task of Multi-Target Multi-Camera vehicle tracking. We propose to associate single-camera trajectories into multi-camera global trajectories by training a Graph Convolutional Network. Our approach processes all cameras simultaneously, providing a global solution, and is robust to large synchronization offsets between cameras. Furthermore, we design a new loss function to deal with class imbalance. Our proposal outperforms related work, generalizes better, and, unlike the compared approaches, requires no ad hoc manual annotations or thresholds.
Video-and-language pre-training has shown promising results for learning generalizable representations. Most existing approaches model video and text implicitly, without considering explicit structural representations of the multi-modal content. We refer to such representations as structural knowledge, which expresses rich semantics at multiple granularities. Related works propose object-aware approaches that inject similar knowledge as inputs. However, existing methods usually fail to use such knowledge effectively as a regularizer to shape a superior cross-modal representation space. To this end, we propose a Cross-modaL knOwledge-enhanced Pre-training (CLOP) method with Knowledge Regularizations. Our method has two key designs: 1) a simple yet effective Structural Knowledge Prediction (SKP) task to pull together the latent representations of similar videos; and 2) a novel Knowledge-guided sampling approach for Contrastive Learning (KCL) to push apart cross-modal hard negative samples. We evaluate our method on four text-video retrieval tasks and one multi-choice QA task. The experiments show clear improvements, outperforming prior works by a substantial margin. In addition, we provide ablations and insights into how our methods affect the latent representation space, demonstrating the value of incorporating knowledge regularizations into video-and-language pre-training.
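Physics-based models have been the mainstream approach to developing predictive models in fluid dynamics. In recent years, machine learning has offered a renaissance to the fluid community, owing to rapid developments in data science, processing units, neural-network-based technologies, and sensor adaptations. So far, in many applications in fluid dynamics, machine learning approaches have mostly focused on a standard process that requires centralizing the training data on a designated machine or in a data center. In this letter, we present a federated machine learning approach that enables localized clients to collaboratively learn an aggregated and shared predictive model while keeping all the training data on each edge device. We demonstrate the feasibility and prospects of this decentralized learning approach in an effort to build a deep learning surrogate model for reconstructing spatiotemporal fields. Our results indicate that federated machine learning might be a viable tool for designing highly accurate, predictive, decentralized digital twins relevant to fluid dynamics.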
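The idea of clients learning a shared model without moving their data can be illustrated with federated averaging. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not name the aggregation rule, so sample-weighted averaging is assumed, and a two-parameter linear model stands in for the deep surrogate. All function names (`local_update`, `federated_average`, `train`) are hypothetical.

```python
from typing import List, Tuple

Params = Tuple[float, float]          # (w, b) of the toy model y = w * x + b
Dataset = List[Tuple[float, float]]   # local (x, y) pairs that never leave the client


def local_update(params: Params, data: Dataset,
                 lr: float = 0.05, steps: int = 20) -> Params:
    """Run a few gradient-descent steps on one client's private data."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(client_params: List[Params],
                      client_sizes: List[int]) -> Params:
    """Server step: average client parameters weighted by local sample count
    (an assumed aggregation rule)."""
    total = sum(client_sizes)
    w = sum(p[0] * n for p, n in zip(client_params, client_sizes)) / total
    b = sum(p[1] * n for p, n in zip(client_params, client_sizes)) / total
    return w, b


def train(clients: List[Dataset], rounds: int = 100) -> Params:
    """Alternate local training and server aggregation; only parameters are shared."""
    params: Params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(params, data) for data in clients]
        params = federated_average(updates, [len(d) for d in clients])
    return params


if __name__ == "__main__":
    # Two clients observe different parts of the same underlying line y = 2x + 1.
    client_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
    client_b = [(1.5, 4.0), (2.0, 5.0)]
    print(train([client_a, client_b]))
```

Only the parameter tuples cross the client boundary in `train`; each `Dataset` is read solely inside `local_update`, which is the property the abstract emphasizes.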
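Accurate tooth volume segmentation is a prerequisite for computer-aided dental analysis. Deep learning-based tooth segmentation methods have achieved satisfactory performance but require a large amount of tooth data. Publicly available dental data are limited, which means that existing methods cannot be reproduced, evaluated, or applied in clinical practice. In this paper, we establish a 3D dental CBCT dataset, Ctooth+, with 22 fully annotated volumes and 146 unlabeled volumes. We further evaluate several state-of-the-art tooth volume segmentation strategies based on fully supervised learning, semi-supervised learning, and active learning, and define the performance principles. This work provides a new benchmark for the tooth volume segmentation task, and the experiments can serve as a baseline for future AI-based dental imaging research and clinical application development.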
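The ability of pretrained Transformers to remember factual knowledge is essential for downstream tasks such as closed-book question answering. Existing work has shown that pretrained Transformers can, to some degree, recall or leverage factual knowledge that appears in the pretraining phase. However, because model capacity is limited, the ability of pretrained models to memorize knowledge is also limited. Dai et al. (2022) find that the feed-forward networks (FFNs) in pretrained Transformers store factual knowledge in a memory-like manner. Inspired by this finding, we propose a Neural Knowledge Bank (NKB) to store extra factual knowledge for pretrained Transformers. Specifically, we also regard FFNs as key-value memories and extend them with additional memory slots. During knowledge injection, we keep the original model fixed and inject factual knowledge into the extended memory slots, so the pretrained model does not forget what it already knows. In addition, the view of FFNs as key-value memories makes the NKB highly interpretable. We use three closed-book question answering datasets to show the NKB's strong ability to store extra factual knowledge. We also show, on two representative generation tasks (summarization and machine translation), that the NKB does not degrade the general language generation ability of pretrained models. Furthermore, we thoroughly analyze the NKB to reveal its working mechanism and present the meaning of its keys and values in a human-readable way. Finally, we make a preliminary attempt to directly update the factual knowledge in the NKB without any additional training.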
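The key-value view of an FFN, and what it means to extend it with additional memory slots, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the NKB implementation: a ReLU activation is assumed, bias terms are omitted, and the function names `memory_read` and `ffn_with_extra_slots` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def relu(z: float) -> float:
    return max(0.0, z)


def memory_read(x: Vector, keys: Matrix, values: Matrix) -> Vector:
    """Key-value view of an FFN: output = sum_i relu(x . k_i) * v_i.
    Each (k_i, v_i) pair acts as one memory slot. ReLU is an assumption."""
    out = [0.0] * len(values[0])
    for key, value in zip(keys, values):
        activation = relu(sum(xi * ki for xi, ki in zip(x, key)))
        for j, vj in enumerate(value):
            out[j] += activation * vj
    return out


def ffn_with_extra_slots(x: Vector,
                         keys: Matrix, values: Matrix,
                         extra_keys: Matrix, extra_values: Matrix) -> Vector:
    """Original slots stay untouched (frozen); extra slots add their own
    contribution, so only extra_keys/extra_values would be trained."""
    base = memory_read(x, keys, values)
    if not extra_keys:
        return base
    extra = memory_read(x, extra_keys, extra_values)
    return [b + e for b, e in zip(base, extra)]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0]]
    extra_keys = [[2.0, 0.0]]
    extra_values = [[10.0, 0.0]]
    print(ffn_with_extra_slots([1.0, 0.0], keys, values, extra_keys, extra_values))
```

Because the extra slots contribute additively, an input that activates none of the extra keys yields exactly the original FFN output, which is one way to read the claim that the pretrained model does not forget.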